Towards Spam Detection at Ping Servers

Authors

  • Pranam Kolari
  • Timothy W. Finin
  • Akshay Java
  • Anupam Joshi
Abstract

Spam blogs, or splogs, feature plagiarized or auto-generated content. They create link farms to promote affiliates, and are motivated by the profitability of hosting ads. Splogs infiltrate the blogosphere at ping servers, systems that aggregate blog update pings. Over the past year, our work has focused on detecting and eliminating splogs. As the techniques used by spammers have evolved, we have learned that splog signatures are tied to the tools that create them, that splogs are beginning to be a problem across languages, and that they require much quicker assessment. Though we continue to address these specific challenges, in this work we discuss our larger goal of developing a scalable meta-ping filter that detects and eliminates update pings from splogs. This will considerably reduce computational requirements and manual effort at downstream services (search engines) and involve the community in detecting spam blogs.

Similar resources

Towards Improving E-mail Content Classification for Spam Control: Architecture, Abstraction, and Strategies

This dissertation discusses techniques to improve the effectiveness and the efficiency of spam control. Specifically, layer-3 e-mail content classification is proposed to allow e-mail pre-classification (for fast spam detection at receiving e-mail servers) and to allow distributed processing at network nodes for fast spam detection at spam control points, e.g., at e-mail servers. Fast spam dete...

Restraining transmission of unsolicited bulk e-mail

Filtering large amounts of unsolicited bulk e-mail, also known as spam, is expensive, either because of the time spent on manual deletion or because complex analysis by algorithms puts a heavy load on e-mail servers. Due to exponential growth of the volume of spam campaigns, Internet Service Providers (ISPs) are increasingly forced to use rigorous rejection policies to prevent their filter servers fr...

Real-time statistical rules for spam detection

Spam detection methods fall into two categories: rule-based and statistical. The former detects spam by looking for spam-like patterns in an e-mail. Since the rules can be shared, they have been popularized quickly. The rules, however, are built manually, so it is hard to keep them up to date with variations in spam. The statistical-based method, on the other hand, is possib...

Characterizing the Splogosphere

Weblogs, or blogs, collectively constitute the Blogosphere, forming an influential and interesting subset of the Web. As with most Internet-enabled applications, the ease of content creation and distribution makes the blogosphere spam-prone. Spam blogs, or splogs, are blogs hosting spam posts, created using machine-generated or hijacked content for the sole purpose of hosting ads or raising the Pag...

Survey on Text Classification (Spam) Using Machine Learning

E-mail spam is a very serious problem today. It has many consequences: it lowers productivity, occupies space in mailboxes, spreads viruses, Trojans, and material containing potentially harmful information for certain categories of users, and destabilizes mail servers. As a result, users spend a lot of time sorting incoming mail and deleting undesirable corresponde...

Publication date: 2007